Generalized Hypergeometric Ensembles: Statistical Hypothesis Testing in Complex Networks
Abstract
Statistical ensembles of networks, i.e., probability spaces of all networks that are consistent with given aggregate statistics, have become instrumental in the analysis of complex networks. Their numerical and analytical study provides the foundation for the inference of topological patterns, the definition of network-analytic measures, as well as for model selection and statistical hypothesis testing. Contributing to the foundation of these data analysis techniques, in this Letter we introduce generalized hypergeometric ensembles, a broad class of analytically tractable statistical ensembles of finite, directed and weighted networks. This framework can be interpreted as a generalization of the classical configuration model, which is commonly used to randomly generate networks with a given degree sequence or distribution. Our generalization rests on the introduction of dyadic link propensities, which capture the degree-corrected tendencies of pairs of nodes to form edges between each other. Studying empirical and synthetic data, we show that our approach provides broad perspectives for model selection and statistical hypothesis testing in data on complex networks.
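As a rough illustration of the configuration-model special case mentioned in the abstract, the sketch below treats network generation as drawing edges without replacement from an urn. The abstract does not give this construction, so the following are assumptions made for illustration: each ordered pair of nodes (i, j) contributes out-degree(i) × in-degree(j) balls to the urn, all dyadic link propensities are equal, and self-loops are allowed. The function names and the toy adjacency matrix are hypothetical.

```python
import numpy as np


def possible_edges(adjacency):
    """Assumed urn composition: Xi[i, j] = out-degree(i) * in-degree(j)."""
    k_out = adjacency.sum(axis=1)  # weighted out-degree of each node
    k_in = adjacency.sum(axis=0)   # weighted in-degree of each node
    return np.outer(k_out, k_in)


def sample_network(adjacency, rng):
    """Draw one random multigraph with as many edges as `adjacency`.

    Edges are sampled without replacement from the urn, so the edge
    counts follow a multivariate hypergeometric distribution
    (uniform-propensity case only).
    """
    xi = possible_edges(adjacency)
    m = int(adjacency.sum())  # total number of edges to draw
    counts = rng.multivariate_hypergeometric(xi.ravel(), m)
    return counts.reshape(adjacency.shape)


def expected_network(adjacency):
    """Expected edge counts under the urn model: m * Xi / sum(Xi)."""
    xi = possible_edges(adjacency)
    m = adjacency.sum()
    return m * xi / xi.sum()


# Toy directed, weighted network (hypothetical data, not from the paper).
observed = np.array([[0, 2, 1],
                     [1, 0, 0],
                     [3, 1, 0]])

rng = np.random.default_rng(seed=0)
sample = sample_network(observed, rng)
expected = expected_network(observed)
```

Under these assumptions each sampled network has exactly the observed number of edges, and the expected in- and out-degrees match the observed ones. Unequal dyadic propensities would call for a non-central (biased) urn model, which this sketch does not cover.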
Similar resources
Acceptance sampling for attributes via hypothesis testing and the hypergeometric distribution
This paper questions some aspects of attribute acceptance sampling in light of the original concepts of hypothesis testing from Neyman and Pearson (NP). Attribute acceptance sampling in industry, as developed by Dodge and Romig (DR), generally follows the international standards of ISO 2859, and similarly the Brazilian standards NBR 5425 to NBR 5427 and the United States Standards ANSI/ASQC Z1....
From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles
The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniqu...
Multiplex Network Regression: How do relations drive interactions?
We introduce a statistical method to investigate the impact of dyadic relations on complex networks generated from repeated interactions. It is based on generalised hypergeometric ensembles, a class of statistical network ensembles developed recently. We represent different types of known relations between system elements by weighted graphs, separated in the different layers of a multiplex netw...
Eigenvalues and Condition Numbers of Complex Random Matrices
In this paper, the distributions of the largest and smallest eigenvalues of complex Wishart matrices and the condition number of complex Gaussian random matrices are derived. These distributions are represented by complex hypergeometric functions of matrix arguments, which can be expressed in terms of complex zonal polynomials. Several results are derived on complex hypergeometric functions and...
Matrix averages relating to the Ginibre ensembles
The theory of zonal polynomials is used to compute the average of a Schur polynomial of argument AX, where A is a fixed matrix and X is from the real Ginibre ensemble. This generalizes a recent result of Sommers and Khoruzhenko [J. Phys. A 42 (2009), 222002], and furthermore allows analogous results to be obtained for the complex and real quaternion Ginibre ensembles. As applications, the posi...
Journal title: CoRR
Volume: abs/1607.02441
Publication date: 2016